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ABSTRACT 

This paper studies prediction of future failure (rates) by hierarchical empirical Bayes (EB) Poisson 
regression methodologies. Both a gamma distributed superpopulation as well as a more robust 
(long-tailed) log student-t superpopulation arc considered. Simulation results are reported 
concerning predicted Poisson rates. The results tentatively suggest that a hierarchical model with 
gamma superpopulation can effectively adapt to data coming from a log-Studcnt-t superpopulation 
particularly if the additional computation involved with estimation for the log-Student-t 
hierarchical model is burdensome. 



1. INTRODUCTION 

The following model often provides a useful place from which to 
commence the analysis of point event process data. First, suppose there is a 
set of I entities or units, each of which generates an observed history of point 
events. Take each describing point process to be homogeneous Poisson (X,), i 
= 1, 2, 3, ..., I. The observed data appears as (sj,tj), Sj being the number of 
events for process i over active or operating time tj. Also observed are certain 
fixed explanatory variable values; Xjj, j = 1, 2, ..., p, associated with A.;. In some 
literature, e.g., Everitt (1984), such variables are called manifest. Second, 
there is a latent quantity, 8j, associated with Xj, that is unobservable but 
influences Xj behavior. It is convenient to view 8j, at least provisionally, as 
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being drawn randomly from some superpopulation of values and held fixed 
thereafter, thus endowing X, with its own particular individuality. 

We call such a setup hierarchical, and ask it to furnish insights and 
numbers concerning (a) the individual rate values, Xj, (b) the influence of the 
explanatory variables upon these rates, and (c) the nature of the 
superpopulation that gives rise to the latent variable values; future values of 
the rates, e.g., A.j+ 1 , etc., may be viewed as coming from such a population, at 
least to a first approximation. 

The above model suggests itself for many purposes, one in particular 
being in risk analysis, e.g., of nuclear power plant safety systems. Such setups 
are also natural in other reliability-related areas as well, particularly in ones 
arising in the military. Application may perhaps be made to data reflecting 
"human unreliability," i.e., the propensity of different individuals to make 
errors, or experience accidents. 

The purpose of this paper is to describe methods for fitting various 
hierarchical models to the type of data described. Particular attention is 
devoted to the prediction problem: given the past record of an individual 
item (e.g., human being), how well can one predict its (her) future 
performance, even if some basic conditions change? 



2. THE FORMAL MODEL 

The formulation proposed can be written as follows: for i = 1,2, ..., I, and 

e=(pi,p2,.~, Pp) T 

5j ~1 1 D g(;ft) 

Xj = KxiMi), (2.1) 

Sj | A.j,tj~Ind. Poiss(Xjtj); 



ft = (0i, 02 , ..., 0 r ) being a parameter identifying g, the density associated with 
the assumed fixed superpopulation. In what follows we concentrate on 
certain parametric forms for the link function f and the superpopulation g, 
and aim at estimating the ft-value best representing the superpopulation 
giving rise to the apparent X-values. For various reasons, convenience and 
tradition being influential, we restrict attention to the log-linear model 



Xj = f(xi&5i) = exp(xjft+6j). (2.2) 

As suggested earlier, the objectives of the analysis will be several-fold, but an 
important one will be to estimate an individual Xj-value, i.e., the actual 
realization of Xj that prevails. An even more important objective is to predict 
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the number of future events associated with i, Sj(t). This entails finding an 

A A 

estimate {L and one for the individualizing parameter 8i, namely 5j. 
Estimation will be carried out by assuming that g(-,d), the superpopulation 
density giving rise to 5i, is one of a specific parametric family, and first 
estimating the parameters of that density along with the regression 

A A 

parameters. At a later stage, the estimated parameters 0 and Q_ are utilized to 
create estimates of 5i, and finally ^i, see Cox and Hinkley (1974), p. 401, Morris 
(1983), Deely and Lindley (1981), etc. The several-stage or hierarchical analysis 
is referred to as parametric empirical Bayes (PEB). 

This work is an extension of Gaver and O'Muircheartaigh (1987) in which 
discrepancy-tolerant (robust) estimates of 5j and were produced and 
evaluated without consideration of explanatory variables. The major 
purpose of the present article is to consider the effect of explanatory variables 
in the context of hierarchical models using quite different models for 
superpopulations: first, the simple conjugate Gamma, and next the log- 
Student t with a small number of degrees of freedom so that tails are 
extended, and outliers more apt to be generated. 



3. AN EMPIRICAL BAYES APPROACH 

The approach taken to providing estimates is traditional; see Berger (1985, 
Chap. 4). We first remove the condition on 5 for each item to obtain the 
unconditional likelihood 



L(CT,£;s,f) = flj^ 









(3.1) 



A A 

The latter is then maximized with respect to Ji and f) to produce 0 and Q.. 
These quantities are then inserted into the expression for the posterior 
density of 6, 



g,(« ) * g,{S;e.s. ■ = Ke (/(si, «)(. f 

where the constant Kj is a normalizing factor. A point estimator of is taken 
to be the posterior mean (other options of course available). 



i,=jf{l£,5)g r (5)d5 
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where X , is the value of the explanatory variable for conditions anticipated 

when the estimate is to be applied; if x { = X { then we have A ( , an empirical 
Bayes estimate of Aj, for conditions under which the data were taken; this will 
often be a shrunken estimator that has a smaller mean-squared error than 

does a simple individual estimator. If x t refers to other (e.g., future) 

conditions then A, calculated by (3.2) may be called the mean predictive rate. 
If more information is desired then the entire predictive distribution is 

needed: 




j- lSr (S)dS' 



(3.3) 



this approximates the conditional probability of S, future events for item i, 

given that it is exposed for time t , and under conditions £,. 

It is apparent that the approximation so obtained may be under-variable, 

in that it treats and 6 as fixed and known in (3.2) and (3.3). The 

hierarchical Bayes analysis described by Berger (1985, Chap. 4) is a substitute 
that avoids that criticism. This defect is undeniable, but some appreciation 
for the magnitude of the effect can be obtained by bootstrapping. Of more 
concern to us has been investigation of the effect of superpopulation model 
choice : how different can actual prediction be in simple situations modeled 
quite differently? We proceed to compare and contrast two models, one 
conjugate Gamma and the other longer-tailed and hence outlier-prone. 



4. GAMMA LATENT VARIABLE POISSON REGRESSION (GALVPR). 

It is conventional and convenient to invoke the gamma density to 
represent the random effect in (2.1); see Lawless (1987 a,b) and Anscombe 
(1950) for examples. Thus 



S (u; a ) = e-“'“ {u/ f 

rf" ') ‘ 



is the superpopulation model, from which 

E[8] = l 

Var[8] = a. 



(4.1) 



(4.2) 



4 



Lawless (1987b) gives expressions for the ln-likelihood and its derivatives for 
this hierarchical model. It turns out, however, that a more satisfying 
parameterization is in terms of 9=ln a when the mle stage is undertaken. 
Since a one-parameter gamma density is used, the regression has a constant 
term; that is xn =1. For convenience we provide expressions for the ln- 
likelihood and its derivatives using our parameterization. 

In the present parameterization, then, 

A, = U exp[xjJ3] (4.3) 

so exp(S) = U is gamma. In order to form the likelihood element in (3.1) it is 
only necessary to integrate to obtain the explicit form 



L(0,£;s,() = n 



r ( s . 



+ e 



e e c,t, 



W s,!r(c- e ) U e c,f, + 1 



e 6 c,t, + 1 



(4.4) 



where c, = expj 


The log-likelihood 


is 


1 


Ysf-1 ^ 








Iln(l + /f®) 


- In s,- !+ Sj 


In c,t, - ln^ql, + ljj - e~ 6 ln(e 0 Cjf,- + lj 


i=i 


.1 /=° J 







s. -1 



where ^ln(l + j — 1 if s, — 0. 

The following derivatives can be obtained. 



— = e a y 



l 



($-1 j ' 



IV 



+ e~ 2d In 



C^Citj + 1 



? f£L -[s,+^ e ] 

V ,+' 1 J 



(4.6) 



de 2 de I if? 



, f(V. ( j ^ 



I 

)=0 



1 + je° 



2e~ 30 \n\e\t, + ll + 2-^- + 

1 ' ‘ J e e c.t. + 1 



-20 ( 



rt, 



K e c,t t + 1, 



[ s ‘ +e °]J’ 

(4.7) 



dl 

dPk 



- 1 



1 = 1 



s, ~ cj, 

e e c.t .+ 1 



(4.8) 
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(4.9) 



c) 2 J 

dfi k dpj 






= -I 

i-l 



(e°Cjtj + lj 



2 x ij x ik 



where the summations involving Si -1 are set equal to zero when Sj = 0 . 
Further, 



E 



’ -d 2 l 


I 

_ Y 


( c,t, ) 


dWfl, 


i=i 


k e 6 C,t , + 1; 



A- 



(4.10) 



A Newton-like iterative procedure is used to solve the system of 
equations 



dl 



dd 



(4.11) 



dl 






(4.12) 



If Sj is large then evaluating the sums appearing in (4.5), (4.6), (4.7), (4.13) 
and elsewhere tends to be time-consuming. However all such sums are well- 
behaved (of monotonic formations) and can be well-approximated by 
integrals. This feature is not, but easily can be, included in our programs. 

If (Pk) were known, then a Newton procedure to estimate 0 would be to 
recursively solve the linear equation 



dl _ dl 


f & 1 


\ 


dd dd 


e = d 0 + {dd 2 


o 

ct> 

II 



( 0 - 0 °) 



(4.12) 



n 31 

where 0 U is a current estimate of 0. Note that if = 0, then 



dh 

de 



j = g(0) = e 



2 e 



r Vi ( , ^ 
,=o 



1 + je 



- 2e + l| + 



2 C{t x e 



-26 ( 



e°c l t l + 1 



CiU 



e u c x t x + 1 



(4.13) 



Hence, (4.12) can be rewritten as 



6 



/ 

!• 

7=0 



0 - 0 ° = 



^ c- _1 / 



I 

z 

1=1 



s,-l 



y _ 1 
/To^ 6 



Si - 1 

+ ;e t 



A 



+ e 



-20, JO 



In *,'(,•+] — ^ 






e qtj + 1 



s, + e 



+ 2e 30 Inf e 0 c,7,- + ll- 



2 c,t,e 



-20 



c°c l t l + 1 



Cih 



y 



e qtj + 1 



s,- + e 



-i-i 



(4.14) 

To obtain an initial estimate of 0, note that, letting Nj(tj) denote the i ,h 
random variable of the number of observed events, 

(4.15) 

Var[N,(t l )] = c l t l [l + c,t l e e }. (4.16) 



Thus, [(N j(tj) - qt,)/ has mean 0 and variance [1 + Cit;e e ]. We propose 
starting the iterative procedure to find 0 by computing 



m 



= y£ 

1 i=i 



s, 



■c,t, 



(4.17) 



If m < then a log-linear model is used to describe the data. If in > 1, then 
the initial estimate of 0 is 



e 0 = In 



(m-1) 



yX c . f . 

_ 1 i=i 



(4.18) 



If 0 were known, then (p^l could be estimated with generalized linear 
model software in the following manner, (cf. McCullagh and Nelder [1983]). 



A Newton iteration to solve the equations 0 = is to solve the system of 

°Pk 

equations 



0 = 



dl 



ay 2 = 2 



+1 

;=i 



d 2 l 









(4.19) 



where Ji° is the current estimate of Jl. 
Put 
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w. = 



Cj, 2 



where C; = exp(xij}°). 

Equation (4.19) can be rewritten as 




k=l, ..., p 
where 





and 



(4.20) 



(4.21) 



(4.22) 



= w t X ir 



(4.23) 



The equations of (4.21) are the normal equations for a least squares regression. 
The following is an iterative procedure to obtain estimates of 0 and (PjJ 
a) Fit a log-linear model stopping after one iteration 
1. Start with 



with 



£ f /J° = In 

Z Solve the equation (4.21) 



s i + \ I / l i 



W. = 



1 

S, H — 

2 



u i; = w,x t/ ; 



y, = ~v>i 



V 

v w .y 



+w,x,e o . 



b) Find the initial estimate of 0 by evaluating (4.18). If m<l, use the log- 
linear model of a) to describe the data. 

I. Next estimate (Pk)- Evaluate and solve equations (4.20) - (4.23). 
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II. Next estimate 0: Evaluate and solve equation (4.14). 

III. Continue alternating between I and II until convergence. 

In the simulation experiments described in Section 7 the above-obtained 
estimate of 0 occasionally either cycled among negative values or became 
very large and positive. In these cases II was replaced by a search of the 
marginal likelihood for 0 with fixed {(5^). 

5. ROBUST HIERARCHICAL POISSON REGRESSION (ROLVPR): THE 

LOG-STUDENT t SUPERPOPULATION 

As an alternative to the GALVPR model, allow 5 to have the Student t 
density 

g(8;r 2 ,d) = dJ± . (5.D 

(l + 5 / t 2 ) 2 

this distribution is adjustably longer-tailed than is the log-Gamma 
distribution (for 8) of the previous model, and hence better represents 
outliers and extreme extra-Poisson variability. The parameter d is the 
"degrees of freedom" for the Student t; for the present purpose a low value of 
d (e.g., d = 3-5) is useful. The Student-t model for log failure rate was 
introduced in Gaver and O'Muircheartaigh (1987). There it was pointed out 
that the marginalization step of (3.1) could be performed using Gauss- 
Hermite numerical integration; see Naylor and Smith (1982). In this paper 
we employ a variant of the Gauss-Hermite technique that involves an initial 
correction by Laplace's method. 

The procedure currently adopted for fitting the regression parameters (i 
in addition to the Student t parameter x proceeds iteratively: first explain as 
much item-to-item variability as possible by suitably weighted regression, 
then alter the model to approximately adjust for regression effects and apply 
the methodology of Gaver and O'Muircheartaigh (1987) to estimate x 2 . This 
value then provides refined weights for a new regression. We speak of 
rocking back and forth between the regression and latent variable stages. 

5.1. Rocking Algorithm when 8; - Student (p, x, d) 

Here is how the above procedure operates when latent variables are 
Student t so as to represent adjustably long-tailed outlier-prone regressions; d 
> 1 is a tuning parameter with Var[8] = x 2 d/ (d-2) if d > 2. 

a) Regress y j(l )= x/s; ln(sj/tj) on Vsi x,; (5.1) 
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Replace Sj / 1, = O/tj by l/3tj. Obtain 0(1). 

b) In the ith likelihood component obtained by integrating out with 
respect to the 5j-distribution, 

^ — sir \ dz (5.2) 

[i+(z v^)]- t 

where 

lnA,(z) = XtQ + z, (5.3) 

replace tj by / i e ?,/9<1) = f,(l). Now numerically optimize (5.2) by choice of 

T=*(2);t A (l) is a moment estimator. Details of the likelihood integral 
approximation and optimizations are furnished in Section 6. 



c) Regress y.(2) = 






-l 
\ 2 



In (s./fj) on 






i + e(2) 



-l 

AT 



V s , 



- 2 






where t; and s, are the original data values. Obtain /3(2). 



d) In step b above replace tj by f i e i,fl<2) and by x^(2) and again 
numerically optimize to find i{ 3). 

e) Return to step c) with i 2 ( 3). 



0 



Continue to convergence of 






The above procedure converges rapidly in our experience, giving results 
in close agreement with the simultaneous optimization of the likelihood 
with respect to x and 0. The latter is a much more computationally 
demanding procedure than is rocking. 



6. LIKELIHOOD COMPUTATION 

An essential part of the preceding algorithm is the numerical evaluation 
of likelihoods of this form: 

1=1 

where 
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(6.2) 




[*.(*)! 



c(r 




[l + z 2 / r 2 d| 



= re-^dz. 

J -oo 



Under the log-linear model 



lnA,(z) = X'Q + z 



(6.3) 



so 



Q,(z) = A.-fz)*,- - s,- In A.(z) + ^-ln[l + z 2 / r 2 d] + lnr, (6.4) 



omitting irrelevant constants. In order to evaluate the integral in (6.2) 
approximately but reasonably accurately vve apply either (a) a version of 
Laplace's method, in which Qj(z) is approximated by a quadratic and 
integrated explicitly; alternatively (b) apply a refined version of (a) involving 
Gauss-Hermite integration of the error resulting from the quadratic 
approximation to (6.3). Here is a sketch of the process. In what follows we 
will modify the time to be tie?*. 

6.1. Laplace Method, and a Refinement. 

To compute L, = J e~ Q, ^dz, the ith likelihood component, we begin by 

approximating Qj by a quadratic as follows. Since \j = e 2 , tj is modified as 
indicated above, and 



-Q,(z) = -X,t, + s,lnA, -^y^-ln(l + z 2 / x 2 d)-\nr, 




(6.5) 



= -e z t, + s, - w(z)z / r 2 



where 




is considered to be a weight. Now -dQi/dz-0 entails the equation 
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e 



(6.6) 







f d+1 T 


1 


z 

c _ 1 _ 


d 


t, 


' T 2 


[l + z 2 / r 2 d] 









Equation (6.6) may have two solutions. We obtain a single reasonable 
approximate solution to (6.6) as follows: An initial solution to (6.6) is 



2 ,(0) = ln(s, / /,) if s,- >0, 
2,(0) = ln(l/3f,) if s, = 0; 



(6.7) 



other replacements for the zero count situation are possible. 
Let z, be the solution to the equation 



eZ= r[ s i- zu ’( z ,(°))^ 2 ] 

after one Newton-Raphson iteration starting at Zj(0). 

Next evaluate an approximation to Q, (z^). First approximate 

Qi{ z ) = e Zt i-S,+w{Zi)-?T- (6.8) 

T 

Hence, 

Q l (z) = e z t,+w{z i )\. (6.9) 

T 



Finally approximate e z ‘ by Sj/tj resulting in the approximation 

Qife)“ s . + w ife)“T- (6-10) 

T 

Write 

Q,( 2 ) = Q.( 2 .) + ( 2 - 2 .) 2 |q."( 2 .)+ R .( z ); (6.ii) 

Laplace's method assumes that R,(z) is negligible and hence 
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( 6 . 12 ) 



so the log-likelihood 




(6.13) 



which can be numerically optimized by choice of x 2 for fixed tuning constant 
value d (in principle optimization on d can also be included). 

Improved numerical results have been achieved by writing 



Rj(u>) being defined by (6.11). The integration is then performed by Gauss- 
Hermite technique, i.e., by replacing the integral by a finite sum at points w,- 
determined by zeros of the Hermite polynomials; see Abramowitz and Stegun 
(1964). Experience has shown that the above produce numerical results that 
agree well with other numerical methods such as that of Naylor and Smith 
(1982); however, the unadorned Laplace, (6.12), may sometimes be satisfactory, 
and is certainly more quickly computed, which is a virtue if bootstrapping is 
undertaken. 

Alternative computational procedures exist and have virtues. The 
Newton-like iteration applied to the Gamma model of Section 4 can be 
adapted to the log-Student model, but we have not undertaken this as yet. A 
sampling-based approach of Gelfand and Smith (1988) is a natural option, but 
at present appears unnecessarily computer-intensive. As will appear, even 
the apparently crude rocking approximation proposed leads to interesting 
contrasts between predictions made by the conventional conjugate Gamma 
and the robustifying Student. 




(6.14) 




with 




(6.15) 
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7. NUMERICAL ILLUSTRATIONS 

In order to illustrate the performance of the two proposed prediction 
schemes we have performed extensive simulations. These illustrate the 
anticipated comparative performance of GALVPR and ROLVPR: the latter is 
often better able to adapt to the appearance of large outlier rates by refusing to 
shrink them down as extensively as do the former. The difference between 
the predictions made by the two schemes is less noticeable for small rates; 
here the behavior of the gamma-based approach, GALVPR, may actually be 
superior, probably because of the approximations made when implementing 
the Student ROLVPR model. Improvements in the current procedure for 
fitting the latter, e.g., when counts are zero, are likely to show up as reduced 
upward shrinkages. 

Simulation Experiment 

The present simulations are all based on a group of 1=20 items. For the 
log-linear rate of (2.2) x$ = Pi + p2*i, and Xj = +1 for i = 1, 2, ..., 10, x; = -1 for i = 
11, 12, ..., 20. In addition pi =0.5 and p 2 = 0.1 and 0.3, while tj = 2 throughout. 
For each experiment 20 Poisson rates were then generated from the Student 
model with x = 1.0 and d = 5, and for each rate a single Poisson data point was 
generated with mean A.jtj. These then constitute the observed counts from 
which predictions are made. Each prediction is viewed as a point estimate of 
the underlying Poisson mean giving rise to the corresponding observed 
count; it is a natural point estimate for a future count. The predictions are 
chosen to be the means of the posterior distributions from the GALVPR and 
ROLVPR model specifications, where each model is fitted to the data (20 
counts, plus values of Xj) for the particular experiment, meaning that Pi, i = 1, 
2, and a, for GALVPR, and x 2 , ROLVPR were estimated as described earlier. 
These models were actually fitted by two methods: (a) to all count data in the 
experiment, including that for the item whose rate is predicted, and (b) to all 
data, but omitting the observation for the item to be predicted, i.e., in cross- 
validation mode. 

An illustration of a particular experimental outcome, and the 
corresponding predictions appears in Table I. Note that for this particular 
data set the average mean square error of ROLVPR no-cross-validation 
predictions is the smallest. This is not always so; see the figures for 
comparisons of mean-squared-errors for the two shrunken predictions, and 
raw predictions. 
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Number 



TABLE I 

SAMPLE COMPARISON OF RATES AND ESTIMATES 
Pl=0.5,P2 = 0.1,T = l,d = 5 

Co- Ob- True Raw No Cross- 
variate served validation 



Cross-validation 



i 




Si 


Ai 


Xi(raw) 


Xj(GA) 


Xi(Stu.t) 


Xi(GA) 


Xi(Stu.t) 


i 


+i 




152 


87.57 


76.00 


74.56 


76.28 


53.56 


73.68 


2 


+1 




3 


1.44 


1.50 


1.66 


1.62 


1.66 


1.66 


3 


+1 




2 


0.47 


1.00 


1.17 


1.23 


1 17 


1.26 


4 


+ 1 




2 


3.51 


1.00 


1.17 


1.23 


1.17 


1.26 


5 


+i 




0 


0.65 


0.00 


0.19 


0.47 


0.21 


0.52 


6 


+1 




0 


0.16 


0.00 


0.19 


0.47 


0.21 


0.52 


7 


+i 




5 


2.07 


2.50 


2.64 


2.45 


2.63 


2.48 


8 


+ i 




3 


2.07 


1.50 


1.66 


1.62 


1.66 


1.66 


9 


+i 




8 


1.20 


4.00 


4.10 


3.76 


4.10 


3.77 


10 


+1 




1 


1.11 


0.50 


0.68 


0.85 


0.68 


0.86 


11 


-l 




9 


2.87 


4.50 


4.31 


4.15 


4.28 


4.16 


12 


-1 




2 


0.80 


1.00 


1.10 


1.19 


1.10 


1.19 


13 


-1 




3 


0.43 


1.50 


1.56 


1.59 


1.56 


1.57 


14 


-1 




0 


1.22 


0.00 


0.18 


0.46 


0.19 


0.52 


15 


-l 




9 


5.44 


4.50 


4.31 


4.15 


4.28 


4.16 


16 


1 




7 


3.18 


3.50 


3.40 


3.26 


3.38 


3.27 


17 


-1 




11 


5.40 


5.50 


5.23 


5.10 


5.16 


5.08 


18 


-l 




0 


0.40 


0.00 


0.18 


0.46 


0.19 


0.52 


19 


-1 




3 


0.66 


1.50 


1.56 


1.59 


1.56 


1.57 


20 


-l 




0 


0.03 


0.00 


0.18 


0.46 


0.19 


0.52 










MSE's: 


7.83 


9.16 


7.34 


58.94 


10.61 












TABLE D 












SAMPLE COMPARISON OF RATES AND ESTIMATES 












Pl = 0.5, P 2 = 0.1, T = 1, d = 


5 






Observed 




True 




Raw 




No Cross- 
validation 


Cross-validation 


Si 




Ai 




Xj(raw) 




Xi(GA) 


Xj(Stu.t) 


Xi(GA) 


Xj(Stu.t) 



267 


138.43 


133.50 


132.11 


137.56 


101.48 


130.66 


0 


0.01 


0.00 


0.17 


0.48 


0.19 


0.51 


3 


1.35 


1.50 


1.65 


1.64 


1.65 


1.65 


12 


6.55 


6.00 


6.10 


5.62 


6.10 


5.59 


0 


0.29 


0.00 


0.17 


0.48 


0.19 


0.51 


2 


2.21 


1.00 


1.16 


1.25 


1.16 


1.25 


2 


1.32 


1.00 


1.16 


1.25 


1.16 


1.25 


5 


3.47 


2.50 


2.64 


2.46 


2.64 


2.48 


0 


0.57 


0.00 


0.17 


0.48 


0.19 


0.51 


2 


0.98 


1.00 


1.16 


1.25 


1.16 


1.23 


7 


2.18 


3.50 


3.52 


3.42 


3.52 


3.39 


3 


0.47 


1.50 


1.60 


1.71 


1.60 


1.71 


36 


20.25 


18.00 


17.41 


17.48 


16.91 


17.40 


4 


1.68 


1.00 


1.12 


1.31 


1.12 


132 


2 


3.72 


2.00 


2.08 


2.12 


2.08 


2.12 


0 


0.39 


0.00 


0.17 


0.51 


0.18 


0.58 


6 


2.28 


3.00 


3.04 


3.97 


3.04 


2.96 


2 


1.48 


1.00 


1.12 


1.31 


1.12 


1.32 


9 


2.40 


4.50 


4.58 


4.30 


4.48 


4.29 


10 


4.46 


5.00 


4.96 


4.76 


4.95 


4.77 




MSE's 


2.22 


2.90 


1.08 


69.51 


4.09 


NOTE: this is an 


independent 


experiment from the same setup as 


that of Table 1 
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USE(Comma) - MSE(Row role) MSE(Gommo) - MSE(Raw role) 



Table II provides another illustration of the estimates' performance, this time 
with fewer extreme outliers. Figure 1 exhibits histographs of the mean- 
squared error difference, MSE(ROLVPR)-MSE(GALVPR), for 100 replications 
of the above specific simulation. Note that the advantage is in favor of 
ROLVPR in the majority of the experiments, with an exceptional advantage 
displayed in some cases. Often it is in a few cases of exceptionally large rates, 
and counts, that ROLVPR excels. Figure 2 compares the mean-squared errors 
of each shrunken no-cross validated estimator with the corresponding mean 
square error of the raw-rate estimators; the raw rate estimator is simply the 
count divided by tj = 2. For these data sets the indications are that ROLVPR 
improves upon RAW most of the time when 02=0-1 and when 02=0.3 
(although less decisively), while RAW improves upon GALVPR most of the 
time; neither victory is decisive. These results are perhaps not surprising 
when one refers to Morris (1983), Theorem 1 and subsequent discussion. It 
appears that the convenient conjugate can adapt to non-gamma data quite 
well in many of the present cases at least. 

An undoubted disadvantage of the ROLVPR procedure is its computer 
intensivity: computation of its estimators requires far more time than does 
GALVPR because a root must be found, (6.6), and a numerical integration 
performed. Search is on for a more tractable representation of a "robust g" 
that permits analytical rather than computational evaluation. The inverse 
Gaussian is a candidate; see Dean et al., (1989). Conceivably such an adoption 
will result in better results for small-rate situations. Needless to say RAW, 
which quotes A.j(RAW) = Sj/tj is by far the most economical. Of course it may 
not be used if the covariate value, Xj, changes. 
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